Fluid benchmark support recordio reader #11121
Conversation
```python
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
if args.use_reader_op:
    filelist = [
        os.path.join(args.data_path, f) for f in os.listdir(args.data_path)
```
We can use `glob` to specify the files.
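The suggestion can be sketched in plain Python; the directory and file names below are hypothetical stand-ins for `args.data_path` and its contents:

```python
import glob
import os
import tempfile

# Hypothetical data directory standing in for args.data_path.
data_path = tempfile.mkdtemp()
for name in ("mnist-0.recordio", "mnist-1.recordio", "notes.txt"):
    open(os.path.join(data_path, name), "w").close()

# The PR's approach: list every entry and join paths by hand.
listed = [os.path.join(data_path, f) for f in os.listdir(data_path)]

# With glob, only files matching the pattern are returned, already joined.
filelist = sorted(glob.glob(os.path.join(data_path, "*.recordio")))
print([os.path.basename(p) for p in filelist])
```

Besides joining paths for free, the glob pattern filters out anything in the directory that is not a recordio file.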
benchmark/fluid/README.md
Outdated
and batch_size you choose:

```bash
python -c 'from recordio_converter import *; prepare_mnist("data", 32)'
```
It's better to set `batch_size=1` here; we can set the batch_size in the trainer reader.
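The idea, writing records with `batch_size=1` and grouping them into batches on the trainer side, can be sketched in plain Python (the reader and function names here are stand-ins, not the Fluid API):

```python
def rebatch(record_reader, batch_size):
    """Group size-1 records into trainer-side batches of batch_size."""
    batch = []
    for record in record_reader():
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # the last batch may be smaller

# Stand-in for a recordio reader that yields one record at a time.
records = lambda: iter(range(10))
print(list(rebatch(records, 4)))
```

Storing size-1 records keeps the converted data independent of any particular batch size, so one converted dataset serves every benchmark configuration.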
Done.
… fluid_benchmark_support_recordioreader
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
 iters, num_samples, start_time = 0, 0, time.time()
 for pass_id in range(args.pass_num):
     train_losses = []
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Change `reader_generator = train_reader()` to:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
-    num_samples += len(data)
+    batch_id += 1
+    # FIXME(wuyi): last batch size maybe different
+    num_samples += len(args.batch_size)
```
For `use_reader_op`, if the current pass is not the last, the last batch of this pass is also equal to `args.batch_size`.
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
 for pass_id in range(args.pass_num):
     num_samples = 0
     iters = 0
     start_time = time.time()
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Change `reader_generator = train_reader()` to:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
benchmark/fluid/models/mnist.py
Outdated
```python
        thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
For `use_reader_op`, the `batch_size` of `fluid.layers.batch` is set per card; that is, if the batch size is 256 when training VGG and the machine has 4 cards, the `batch_size` for `fluid.layers.batch` should be 64.
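The arithmetic the reviewer describes can be captured in a small helper (the function name is ours, not the benchmark's):

```python
def per_card_batch_size(global_batch_size, num_cards):
    """Split a global batch size evenly across devices."""
    assert global_batch_size % num_cards == 0, "batch size must divide evenly"
    return global_batch_size // num_cards

print(per_card_batch_size(256, 4))  # the VGG example: 256 over 4 cards
```

When each card runs its own copy of the in-graph reader, passing the global batch size to `fluid.layers.batch` would process `num_cards` times more data per step than intended.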
benchmark/fluid/models/vgg.py
Outdated
```python
        thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
Same as above.
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
@@ -296,9 +331,10 @@ def train_parallel(avg_loss, infer_prog, optimizer, train_reader, test_reader,
         if iters == args.skip_batch_num:
             start_time = time.time()
             num_samples = 0
-        if iters == args.iterations:
+        # NOTE: if use reader ops, the input data is not splited to multiple cards
+        if args.use_reader_op and iters >= args.iterations / args.gpus:
```
I don't think `iters >= args.iterations / args.gpus` is appropriate. The model's accuracy is highly related to the parameters it has learned, and those depend on the number of parameter updates, so maybe we should not do that.
Well, `args.iterations` is intended to let the benchmark finish quickly; there is no concern for model accuracy. To run full model training, we can set `args.iterations` to -1 so that it runs until all training data has been fed.
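The -1 sentinel behavior can be sketched as a loop (names here are ours, not the benchmark's):

```python
def run_benchmark(batches, iterations):
    """Consume batches; iterations == -1 means run until data is exhausted."""
    done = 0
    for _ in batches:
        done += 1
        if done == iterations:
            break
    return done

print(run_benchmark(range(100), 10))  # capped benchmark run
print(run_benchmark(range(100), -1))  # full run over all data
```

Because the counter can never equal -1, the loop naturally runs to data exhaustion without a special case.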
… fluid_benchmark_support_recordioreader
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
@@ -266,7 +266,10 @@ def train(avg_loss, infer_prog, optimizer, train_reader, test_reader, batch_acc,
         # FIXME(wuyi): For use_reader_op, if the current
         # pass is not the last, the last batch of this pass
         # is also equal to args.batch_size.
-        num_samples += len(args.batch_size)
+        if args.use_reader_op:
+            num_samples += args.batch_size
```
`args.batch_size` is now the batch size on each GPU, so it should be `num_samples += args.batch_size * args.gpus`.
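The sample accounting under discussion can be sketched as a helper (the function and parameter names are hypothetical):

```python
def samples_this_step(use_reader_op, batch_size, gpus, fed_batch_len):
    """Count examples processed in one training step.

    With the in-graph reader, each step consumes batch_size examples on
    every GPU; otherwise, count the Python-side batch actually fed.
    """
    if use_reader_op:
        return batch_size * gpus
    return fed_batch_len

print(samples_this_step(True, 64, 4, 0))    # 64 per GPU on 4 GPUs
print(samples_this_step(False, 64, 4, 60))  # last Python-side batch may be short
```

Forgetting the `* gpus` factor undercounts throughput by the number of cards, which is exactly the bug the reviewer points out.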
Thanks very much! Done. A currently known issue: if `--use_reader_op` is set, we must also set `--no_test`; this will be fixed in the next PR.
… fluid_benchmark_support_recordioreader
This can also fix the issue when running with `--gpus > 1`.